Correction of rhodopsin serial crystallography diffraction intensities for a lattice-translocation defect

Correction of serial crystallography diffraction intensities for lattice-translocation defects enabled the interpretation of time-resolved serial femtosecond crystallography data probing the first molecular motions in vertebrate vision.


Introduction
The primary event in mammalian vision is the absorption of a photon by 11-cis retinal, which is covalently linked to a lysine side chain in rhodopsin via a protonated Schiff base (Nakanishi, 1991). Upon light absorption, retinal isomerizes to the all-trans form and the receptor transitions through a number of spectroscopically distinct intermediate states (Lewis & Kliger, 1992; Mathies & Lugtenburg, 2000). Once in the active state, the G protein-coupled receptor (GPCR) catalyses the exchange of GDP for GTP in the transducin G protein to initiate intracellular signalling cascades that result in neuronal signalling (Bennett et al., 1982; Emeis et al., 1982).
Structural and spectroscopic studies have exploited the abundance of rhodopsin in bovine retina to investigate how it performs its critical role in vision. The initial resting-state crystal structure of rhodopsin was followed by further crystal and cryo-EM structures of the receptor in the active state, in complex with signalling partners and in cryo-trapped intermediate states following light activation (Palczewski et al., 2000; Nakamichi & Okada, 2006a,b; Standfuss et al., 2011; Tsai et al., 2018, 2019). However, until recently it had not been possible to determine how the structure of the protein changes as a function of time at physiological temperatures.
In the past decade, time-resolved serial femtosecond crystallography (TR-SFX) experiments have shed light on the photochemistry driving the activity of light-sensitive proteins (Poddar et al., 2022). In particular, the molecular mechanisms of several retinal-dependent microbial opsins have been deciphered. These include the proton pump bacteriorhodopsin (Nass Kovacs et al., 2019; Weinert et al., 2019), the sodium pump KR2 (Skopintsev et al., 2020), the chloride pump NmHR (Yun et al., 2021; Mous et al., 2022) and the C1C2 channelrhodopsin cation channel (Oda et al., 2021). TR-SFX experiments with rhodopsins typically rely on the delivery of microcrystals embedded in a viscous medium to the interaction region (James et al., 2019), where they are illuminated by an optical pulse. After a controlled time delay, the crystals are probed by an X-ray pulse from a free-electron laser (FEL). Only a single still diffraction image can be measured from an individual crystal before it is destroyed by the FEL X-ray pulse. A single data set is therefore typically composed of images collected from tens of thousands of randomly oriented crystals.
While TR-SFX experiments are uniquely capable of visualizing protein conformational changes with atomic spatial and subpicosecond temporal resolution, the sample requirements are often challenging to fulfil. Crystallization in the lipidic cubic phase (LCP) has been the method of choice for most membrane proteins prepared for TR-SFX experiments (Landau & Rosenbusch, 1996). LCP can also be used as a medium to deliver crystals to the X-ray beam via a high-viscosity sample injector.
Optimization of the bulk purification and LCP crystallization protocols for bovine rhodopsin yielded large quantities of well diffracting crystals that were suitable for time-resolved experiments at FELs (Wu et al., 2015;Gruhl et al., 2022).
It was possible to collect diffraction data sets from 'dark-state' crystals, which were not illuminated, and also from crystals 1, 10 and 100 ps after light activation at the Swiss Free Electron Laser (SwissFEL) and the SPring-8 Angstrom Compact Free-Electron Laser (SACLA) (Gruhl et al., 2022; Table 1). An initial model of the dark-state structure could be obtained by molecular replacement, after which several iterations of model building and refinement were carried out. Inspection of the electron-density maps revealed significant density features in the solvent channels that could not be successfully modelled by placing solvent molecules.
After in-depth analysis of the electron-density maps and diffraction intensities, we were able to identify the presence of a lattice-translocation defect within the rhodopsin crystals. Procedures to correct rotation data collected from single crystals exhibiting such defects have been described previously (Wang, Kamtekar et al., 2005;Wang, Rho et al., 2005). Here, we describe the procedure performed to detect, characterize and correct serial crystallography data sets for this pathology. The correction was essential to confidently model both the dark and light-activated structures of rhodopsin, and therefore to elucidate the first molecular rearrangements in vertebrate vision.

Data collection
Bovine rhodopsin was purified from the native source and crystallized in LCP as has been described in detail (Gruhl et al., 2022). The microcrystals embedded in LCP were delivered to the X-ray beam by a high-viscosity sample injector (Weierstall et al., 2014) and diffraction data were collected from crystals either without illumination by the pump laser (dark sample) or at a defined time delay after illumination. Data were collected as still images at SwissFEL and SACLA, with separate dark-state data sets measured at both FELs. Experimental parameters are detailed in Gruhl et al. (2022).
All data were integrated using indexamajig in CrystFEL (White et al., 2012). The integration radius was set to two pixels for SwissFEL data and three pixels for SACLA data. The background annulus was set to between four and six pixels for SwissFEL data and between four and seven pixels for SACLA data. The crystal-to-detector distance was then optimized by sampling detector distances between 91.5 and 97.5 mm at SwissFEL and between 47.5 and 53.5 mm at SACLA. The sampling was performed first in 200 µm increments and then in 20 µm increments, to find the detector distance at which the standard deviations of the unit-cell parameters were minimized.
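The coarse-to-fine distance scan described above can be sketched as follows. This is not CrystFEL code. `index_at` is a hypothetical callable that stands in for re-running indexing at a trial distance and returning the indexed cells. Combining a, b and c through their relative standard deviations is our assumption, since the text does not say how the three spreads were combined.

```python
import numpy as np

def unit_cell_spread(cells):
    """Sum of the relative standard deviations of a, b and c over indexed crystals."""
    cells = np.asarray(cells, dtype=float)        # shape (n_crystals, 3)
    return float(np.sum(cells.std(axis=0) / cells.mean(axis=0)))

def scan_distance(index_at, start_mm, stop_mm, step_mm):
    """Evaluate index_at(d) on a grid of distances; return the tightest-cell distance."""
    distances = np.arange(start_mm, stop_mm + 0.5 * step_mm, step_mm)
    spreads = [unit_cell_spread(index_at(d)) for d in distances]
    return float(distances[int(np.argmin(spreads))])

def coarse_to_fine(index_at, lo_mm, hi_mm, coarse_mm=0.2, fine_mm=0.02):
    """200 um scan over the full range, then a 20 um scan around the coarse optimum."""
    best = scan_distance(index_at, lo_mm, hi_mm, coarse_mm)
    return scan_distance(index_at, best - coarse_mm, best + coarse_mm, fine_mm)

# Hypothetical stand-in for an indexing run: the cell scatter grows with the
# distance error, and the same random seed is reused at every trial distance.
# The cell lengths are illustrative numbers, not the rhodopsin unit cell.
def fake_index(d_mm, d_true_mm=94.366):
    rng = np.random.default_rng(0)
    scatter = 0.001 + 0.01 * abs(d_mm - d_true_mm)
    return np.array([60.0, 90.0, 150.0]) * (1 + scatter * rng.standard_normal((500, 3)))

best = coarse_to_fine(fake_index, 91.5, 97.5)
print(f"optimized detector distance: {best:.2f} mm")
```

With real data, `index_at` would wrap an indexing run using a geometry file edited for each trial distance. The fine scan only refines around the coarse optimum, which keeps the number of indexing runs small.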
Initial phasing by molecular replacement was performed with Phaser (McCoy et al., 2007). The search model was a deposited structure of rhodopsin (… et al., 2004) with solvent and ligand molecules removed. The asymmetric unit contains two rhodopsin molecules arranged as an antiparallel dimer. Rotation of this antiparallel dimer around the crystallographic screw axes generates translational noncrystallographic symmetry (tNCS). phenix.xtriage (Liebschner et al., 2019) detected this tNCS as a peak in the native Patterson map at position 0.5a + 0.378b + 0.5c, with a height of 65% of the origin peak. Multiple iterations of model building in Coot (Casañal et al., 2020) and refinement with phenix.refine (Liebschner et al., 2019) were carried out until the $R_{\text{free}}$ statistic converged. Most metrics of model quality were reasonable at this stage (see Tables 2 and 3). In particular, refinement against the dark-state data collected at SwissFEL yielded $R_{\text{work}}$ and $R_{\text{free}}$ values of 26.65% and 28.27%, respectively, in the resolution range 16.1–1.8 Å. At face value, these values suggested that the model explained the experimental observations well.

Table 1. Merging statistics for dark-state data sets collected at SwissFEL and SACLA (Gruhl et al., 2022).

Indications of lattice-translocation disorder in real space
The R-factor statistics, which quantify the agreement between the observed and model structure factors, converged to values that are commonly considered acceptable in serial crystallography. Nevertheless, inspection of the electron-density maps indicated that our model did not appropriately describe the underlying data. As is common practice, we calculated σA-weighted electron-density difference maps with Fourier coefficients $(mF_o - DF_c)\exp(i\varphi_c)$. Here $F_o$ are the observed structure-factor amplitudes and $F_c$, $\varphi_c$ are the model amplitudes and phases. The weighting factors $m$ and $D$ account for errors in the model coordinates and for the incompleteness of the modelled structure (Read, 1986). These maps are less affected by model bias than $(F_o - F_c)\exp(i\varphi_c)$ maps and are particularly useful for identifying errors in the model and aiding model building. In particular, positive difference peaks correspond to features of the electron density that are unaccounted for by the model.
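For reference, one standard way to write the resulting difference map is as a Fourier synthesis of these coefficients. In the expression below, $V$ is the unit-cell volume and the sign convention matches structure factors of the form $\sum_j f_j\exp(2\pi i\,\mathbf{h}\cdot\mathbf{x}_j)$. This notation is ours and is stated as an assumption.

$$\Delta\rho(\mathbf{x}) = \frac{1}{V}\sum_{\mathbf{h}} \left[mF_o(\mathbf{h}) - DF_c(\mathbf{h})\right]\exp[i\varphi_c(\mathbf{h})]\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x})$$

A positive value of $\Delta\rho$ at $\mathbf{x}$ thus indicates scattering density that is present in the crystal but missing from the model.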
Significant positive difference electron-density features were visible both in the solvent channels between rhodopsin molecules and overlapping with the rhodopsin dimer (Fig. 1). This difference density could not be sensibly modelled by the placement of solvent molecules. Furthermore, some of the observed features resembled amino-acid side chains (Fig. 2a). However, there was insufficient space between the protein molecules to accommodate an additional rhodopsin chain.
We were able to model a single tryptophan residue into a particularly prominent positive difference electron-density feature in the solvent channel (Fig. 2a). We then used Coot to least-squares align tryptophan-centred tripeptides from the rhodopsin structure to the placed tryptophan residue (Fig. 2b). In this way, we could assign this unexplained density feature to Trp265. Least-squares alignment of the entire rhodopsin dimer to this tryptophan residue resulted in the model overlapping well with the unexplained density (Fig. 2c). This second copy of the dimer was related to the original structure by a translation of 22.5 Å along the unit-cell b axis, corresponding to approximately one-quarter of the length of the b axis. As the original dimer and the translated copy spatially overlap, it is not possible for both copies to be present in the same unit cell.
This observation was reminiscent of previously reported cases of lattice-translocation disorder (LTD), that is, the presence of translation-related domains within a crystal, in a number of systems (Bragg & Howells, 1954; Pickersgill, 1987; Wang, Rho et al., 2005; Hare et al., 2009; Tsai et al., 2009; Ponnusamy et al., 2014; Li et al., 2020). We therefore sought to determine whether this was another such case and whether the data could be corrected as described previously by Wang, Kamtekar et al. (2005).
As a first step, the merged intensities were Fourier transformed to produce a Patterson map. This map represents the autocorrelation function of the electron density in real space and exhibits prominent peaks at positions corresponding to frequently occurring interatomic and intermolecular vectors in the structure. In this map, we observed a prominent peak at the position 0.5a + 0.378b + 0.5c, where a, b, c are the unit-cell axes. This peak was accounted for by the noncrystallographic symmetry operation relating the two copies of the molecule in the asymmetric unit. We also observed a second prominent peak at 0.245b (Fig. 3), with a height of 18% of the origin peak for the SACLA data and 25% for the SwissFEL data. We attributed this peak to the vector $\mathbf{t}_d$ relating the two translation-related domains.
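Why a translation-related domain produces a non-origin Patterson peak at $\mathbf{t}_d$ can be illustrated with a one-dimensional toy model. This is a sketch under stated assumptions, not the procedure used on the rhodopsin data. It assumes a single point scatterer per cell, so that the single-domain intensities are constant, and three domains shifted by 0, $t_d$ and $2t_d$ along b, with fractions borrowed from the SwissFEL values quoted later in the text. The Patterson values along (0, u, 0) then need only intensities indexed by k.

```python
import numpy as np

def patterson_1d(I_k, k, u):
    """Patterson values at (0, u, 0): sum over k of I(k) cos(2 pi k u)."""
    return np.cos(2 * np.pi * np.outer(np.atleast_1d(u), k)) @ I_k

# Toy model (assumption): one point scatterer per cell, so I_d(k) = 1 for all k,
# and three domains shifted by 0, t_d and 2 t_d along b with fractions alphas.
k = np.arange(-60, 61)
t_d = 0.245
alphas = (0.01, 0.19, 0.80)

z = np.exp(2j * np.pi * t_d * k)                  # phase of a one-step shift
I_o = np.abs(alphas[0] + alphas[1] * z + alphas[2] * z**2) ** 2

u = np.arange(101) * 0.005                        # fractional coordinate along b
P = patterson_1d(I_o, k, u)
search = (u > 0.1) & (u < 0.4)                    # exclude the origin peak
u_peak = u[search][np.argmax(P[search])]
print(f"non-origin peak at {u_peak:.3f} b, {P[search].max() / P[0]:.0%} of origin")
```

The non-origin maximum falls at 0.245b, with a height of about a fifth of the origin peak in this toy. In a real structure, this peak is superimposed on the Patterson vectors of the molecule itself.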

Indications of lattice-translocation disorder in reciprocal space
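We assume that terms from translation-related domains corresponding to integer multiples of the fundamental translation $\mathbf{t}_d$ contribute to the structure factors of an LTD-affected crystal. Following and generalizing the treatment of LTD presented in Wang, Kamtekar et al. (2005), we model such structure factors with the weighted sum of an infinite number of translation-related terms,

$$F_o(\mathbf{h}) = \sum_{n} \alpha_n F_{d,n}(\mathbf{h}), \qquad (1)$$

where $\mathbf{h}$ denotes the reciprocal-lattice point $h\mathbf{a}^* + k\mathbf{b}^* + l\mathbf{c}^*$ with integer $h, k, l$ and reciprocal-lattice basis vectors $\mathbf{a}^*, \mathbf{b}^*, \mathbf{c}^*$. Real-space domain translations are given by the set $\{n\mathbf{t}_d\}$ of integer multiples of the fundamental translation $\mathbf{t}_d$. The weights $\alpha_n$ represent the unit-cell fractions pertaining to each domain. The single-domain structure factor $F_d$ is given by the usual sum over contributions from all atoms within one unit cell,

$$F_d(\mathbf{h}) = \sum_j f_j \exp(2\pi i\,\mathbf{h}\cdot\mathbf{x}_j), \qquad (2)$$

where $f_j$ is the atomic form factor and $\mathbf{x}_j$ is the position of atom $j$ in the unit cell. The structure factor of a translated domain is

$$F_{d,n}(\mathbf{h}) = \sum_j f_j \exp[2\pi i\,\mathbf{h}\cdot(\mathbf{x}_j + n\mathbf{t}_d)] = F_d(\mathbf{h})\exp(2\pi i\,n\,\mathbf{h}\cdot\mathbf{t}_d). \qquad (3)$$

Inspection of the Patterson map provides the estimate $\mathbf{t}_d = t_d\mathbf{b} = 0.245\mathbf{b}$. Because $\mathbf{a}^*\cdot\mathbf{b} = \mathbf{c}^*\cdot\mathbf{b} = 0$ and $\mathbf{b}^*\cdot\mathbf{b} = 1$, this gives $\mathbf{h}\cdot\mathbf{t}_d = t_d k$. Because high-order terms are progressively less relevant, we truncate the summation in equation (1) to the three terms $n = 0, 1, 2$:

$$F_o(\mathbf{h}) \simeq \left[\alpha_0 + \alpha_1\exp(2\pi i\,t_d k) + \alpha_2\exp(4\pi i\,t_d k)\right]F_d(\mathbf{h}). \qquad (4)$$

Figure 1. Rhodopsin dimer (pink) and symmetry-related rhodopsin molecules (pale orange) shown in cartoon format with σA-weighted $F_o - F_c$ difference density contoured at 4.0σ overlaid (green). The unit cell is shown as a green box.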
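The translated-domain structure factor $F_d(\mathbf{h})\exp(2\pi i\,n\,\mathbf{h}\cdot\mathbf{t}_d)$ can be checked numerically with a toy model: a small sketch, not part of the original analysis. It assumes random atoms, resolution-independent form factors and random reciprocal-lattice points. It confirms that shifting every atom by $n t_d\mathbf{b}$ equals multiplying $F_d$ by a phase that depends on $k$ only, and then forms the three-term sum over domains.

```python
import numpy as np

def structure_factor(hkl, frac_xyz, f):
    """F(h) = sum_j f_j exp(2 pi i h.x_j), with x_j in fractional coordinates."""
    return np.exp(2j * np.pi * (hkl @ frac_xyz.T)) @ f

rng = np.random.default_rng(1)
frac_xyz = rng.random((50, 3))           # 50 toy atoms in one unit cell
f = rng.uniform(6.0, 8.0, 50)            # toy form factors, resolution-independent
hkl = rng.integers(-10, 11, (20, 3))     # 20 random reciprocal-lattice points
k = hkl[:, 1]

t_d = 0.245
shift = np.array([0.0, t_d, 0.0])        # t_d = t_d * b in fractional coordinates
F_d = structure_factor(hkl, frac_xyz, f)

for n in (1, 2):
    # Translate every atom by n * t_d, then compare with the phase-shift shortcut.
    F_explicit = structure_factor(hkl, frac_xyz + n * shift, f)
    F_phase = F_d * np.exp(2j * np.pi * n * t_d * k)
    assert np.allclose(F_explicit, F_phase)

# Truncated three-term sum over domains n = 0, 1, 2 (SwissFEL-like fractions).
alphas = (0.01, 0.19, 0.80)
F_o = sum(a * F_d * np.exp(2j * np.pi * n * t_d * k) for n, a in enumerate(alphas))
print(np.round(np.abs(F_o[:3]), 2))
```

Because the phase depends on $k$ alone, every reflection with the same $k$ is affected by LTD in the same way. This property is what makes a per-$k$ correction possible.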
Squaring the complex structure factors to obtain diffraction intensities $I_o = F_oF_o^*$ introduces the interference terms $(\alpha_0\alpha_1 + \alpha_1\alpha_2)[\exp(2\pi i\,t_d k) + \exp(-2\pi i\,t_d k)]$ and $\alpha_0\alpha_2[\exp(4\pi i\,t_d k) + \exp(-4\pi i\,t_d k)]$. These terms give rise to a characteristic oscillatory behaviour of the observed intensities as a function of the reciprocal-space index $k$,

$$I_o(\mathbf{h}) \simeq \left[\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + 2(\alpha_0\alpha_1 + \alpha_1\alpha_2)\cos(2\pi t_d k) + 2\alpha_0\alpha_2\cos(4\pi t_d k)\right]I_d(\mathbf{h}), \qquad (5)$$

with $I_d(\mathbf{h}) = F_d(\mathbf{h})F_d^*(\mathbf{h})$. This oscillatory behaviour can be recognized in Fig. 4, where measured intensities, averaged over the reciprocal-space indices $(h, l)$, are shown as a function of the index $k$.
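A quick numerical check of the intensity expression in equation (5) is sketched below. The cosine series should equal the squared modulus of the three-term domain sum. The sketch uses the SwissFEL fractions quoted later in the text and a hypothetical helper name, `ltd_modulation`.

```python
import numpy as np

def ltd_modulation(k, alphas, t_d):
    """Factor multiplying I_d(h) in equation (5); it depends on the index k only."""
    a0, a1, a2 = alphas
    theta = 2 * np.pi * t_d * np.asarray(k, dtype=float)
    return (a0**2 + a1**2 + a2**2
            + 2 * (a0 * a1 + a1 * a2) * np.cos(theta)
            + 2 * a0 * a2 * np.cos(2 * theta))

k = np.arange(0, 41)
alphas = (0.01, 0.19, 0.80)   # SwissFEL fractions quoted in the text
t_d = 0.245

# Cross-check: the series must equal |alpha_0 + alpha_1 z + alpha_2 z^2|^2,
# with z = exp(2 pi i t_d k), i.e. the squared truncated sum over domains.
z = np.exp(2j * np.pi * t_d * k)
direct = np.abs(alphas[0] + alphas[1] * z + alphas[2] * z**2) ** 2
m = ltd_modulation(k, alphas, t_d)
assert np.allclose(m, direct)

for kk, mk in zip(k[:9], m[:9]):
    print(f"k = {kk:2d}   I_o / I_d = {mk:.3f}")
```

At $k = 0$, the factor is $(\alpha_0 + \alpha_1 + \alpha_2)^2 = 1$. For these fractions, it oscillates between roughly 0.38 and 1, with a period of about $1/t_d \approx 4$ in $k$. This is the pattern that (h, l)-averaged intensities reveal.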

Correction of diffraction intensities
The inspection and analysis of the electron-density maps, the Patterson map and the merged intensities supported the diagnosis of LTD within the rhodopsin crystals. Tentative domain fractions $\alpha_0$, $\alpha_1$ and $\alpha_2$ were sampled with steps of 0.01 in the range [0, 1]. For each triplet $(\alpha_0, \alpha_1, \alpha_2)$ satisfying

$$\alpha_0 + \alpha_1 + \alpha_2 = 1, \quad \text{with } 0 \le \alpha_0, \alpha_1, \alpha_2 \le 1,$$

we estimated individual domain intensities by inversion of equation (5) as follows:

$$I_d(\mathbf{h}) \simeq \frac{I_o(\mathbf{h})}{\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + 2(\alpha_0\alpha_1 + \alpha_1\alpha_2)\cos(2\pi t_d k) + 2\alpha_0\alpha_2\cos(4\pi t_d k)}.$$

We then calculated the Patterson function of the corrected intensities at the real-space positions $t_d\mathbf{b}$ and $2t_d\mathbf{b}$. We selected the triplet resulting in small Patterson peaks at both examined positions. This gave $\alpha_0$ = 0.01, $\alpha_1$ = 0.19 and $\alpha_2$ = 0.80 for the SwissFEL data set and $\alpha_0$ = 0.00, $\alpha_1$ = 0.15 and $\alpha_2$ = 0.85 for the SACLA data set. Fig. 3 compares the Patterson map values before and after correction along the direction of the unit-cell axis b. Upon correction of the intensities using the optimized values, the oscillations of the $(h, l)$-averaged intensities as a function of index $k$ were dampened (Fig. 4).

Figure 3. Patterson map values for SwissFEL data (a) and SACLA data (b) along the direction $0\hat{\mathbf{a}} + y\hat{\mathbf{b}} + 0\hat{\mathbf{c}}$, where $\hat{\mathbf{a}}, \hat{\mathbf{b}}, \hat{\mathbf{c}}$ are unit vectors aligned with the corresponding unit-cell basis vectors. Values are shown before and after correction of the intensities with $\alpha_0$ = 0.01, $\alpha_1$ = 0.19, $\alpha_2$ = 0.80 (SwissFEL) and $\alpha_0$ = 0.00, $\alpha_1$ = 0.15, $\alpha_2$ = 0.85 (SACLA). The peak height at 0.245b is equivalent to 25% and 18% of the origin peak heights for uncorrected SwissFEL and SACLA data, respectively.
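The grid search and inversion described above can be sketched in one dimension. The correction factor depends on $k$ only, so the Patterson values along (0, y, 0) require just the intensities summed over $h$ and $l$. This is an illustrative sketch on toy data, not the authors' code. The objective, the sum of the absolute Patterson values at $t_d\mathbf{b}$ and $2t_d\mathbf{b}$ relative to the origin, is our assumption. So is the `min_modulation` guard, which skips triplets whose correction factor comes close to zero.

```python
import numpy as np

def ltd_modulation(k, alphas, t_d):
    """Three-domain LTD factor multiplying I_d(h); it depends on the index k only."""
    a0, a1, a2 = alphas
    theta = 2 * np.pi * t_d * np.asarray(k, dtype=float)
    return (a0**2 + a1**2 + a2**2
            + 2 * (a0 * a1 + a1 * a2) * np.cos(theta)
            + 2 * a0 * a2 * np.cos(2 * theta))

def correct_intensities(I_o, k, alphas, t_d):
    """Invert the LTD model: I_d(h) = I_o(h) / modulation(k)."""
    return I_o / ltd_modulation(k, alphas, t_d)

def patterson_1d(I_k, k, u):
    """Patterson values at (0, u, 0) from intensities summed over h and l."""
    return np.cos(2 * np.pi * np.outer(np.atleast_1d(u), k)) @ I_k

def residual_peaks(I_k, k, t_d):
    """|P(t_d b)| + |P(2 t_d b)| relative to the origin peak (assumed objective)."""
    p0, p1, p2 = patterson_1d(I_k, k, [0.0, t_d, 2 * t_d])
    return (abs(p1) + abs(p2)) / p0

def search_fractions(I_k, k, t_d, n_steps=100, min_modulation=0.05):
    """Grid search over alpha_0 + alpha_1 + alpha_2 = 1 in steps of 1/n_steps."""
    best, best_score = None, np.inf
    for i in range(n_steps + 1):
        for j in range(n_steps + 1 - i):
            alphas = (i / n_steps, j / n_steps, (n_steps - i - j) / n_steps)
            m = ltd_modulation(k, alphas, t_d)
            if m.min() < min_modulation:  # near-zero factors would amplify noise
                continue
            score = residual_peaks(I_k / m, k, t_d)
            if score < best_score:
                best, best_score = alphas, score
    return best, best_score

# Toy data: one point scatterer per cell (I_d = 1) with SwissFEL-like fractions.
k = np.arange(-60, 61)
t_d = 0.245
true_alphas = (0.01, 0.19, 0.80)
I_o = ltd_modulation(k, true_alphas, t_d)

print("uncorrected residual:", round(residual_peaks(I_o, k, t_d), 3))
I_corr = correct_intensities(I_o, k, true_alphas, t_d)
print("corrected residual:  ", round(residual_peaks(I_corr, k, t_d), 3))
best, score = search_fractions(I_o, k, t_d)
print("grid optimum:", best, round(score, 3))
```

Two caveats apply. First, $(\alpha_0, \alpha_1, \alpha_2)$ and $(\alpha_2, \alpha_1, \alpha_0)$ give identical correction factors, so the labelling of the domains is not determined by the intensities alone. Second, with a finite number of reflections, a neighbouring triplet can score marginally better than the generating one.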
With the definitions $\beta_0 = \alpha_0^2 + \alpha_1^2 + \alpha_2^2$, $\beta_1 = 2(\alpha_0\alpha_1 + \alpha_1\alpha_2)$ and $\beta_2 = 2\alpha_0\alpha_2$, equation (5) can be rewritten as

$$I_o(\mathbf{h}) \simeq \left[\beta_0 + \beta_1\cos(2\pi t_d k) + \beta_2\cos(4\pi t_d k)\right]I_d(\mathbf{h}),$$

with weights $\beta_0$ = 0.6762, $\beta_1$ = 0.3078, $\beta_2$ = 0.016 for the SwissFEL data and $\beta_0$ = 0.745, $\beta_1$ = 0.255, $\beta_2$ = 0 for the SACLA data. The ratios $\beta_1/\beta_0$ and $\beta_2/\beta_0$ express the relative importance of interference terms for increasing relative displacement. $\beta_1/\beta_0$ is 0.46 for the SwissFEL data and 0.34 for the SACLA data, and $\beta_2/\beta_0$ is 0.02 and 0, respectively. These small values show that it is well justified to neglect terms with $n > 2$ in equation (1). Translation correction factors were determined separately for each of the data sets. Refinement of the rhodopsin dark-state atomic model against the LTD-corrected intensities from SwissFEL yielded an immediate improvement in both the $R_{\text{work}}$ and $R_{\text{free}}$ statistics (21.43% and 24.61%, respectively) and in the model geometry. More importantly, the amount of unexplained $mF_o - DF_c$ difference density in the resulting electron-density maps was much reduced. This allowed further model building to remove inappropriately placed solvent molecules, fix distorted amino-acid side-chain rotamers and build more of the flexible loops in the protein. The final dark-state models therefore accounted better for the experimental data. They also provided improved model phases for calculating the light-minus-dark electron-density difference maps. The $mF_o - DF_c$ maps produced after refinement of the final dark-state model against the intensities with and without correction are in stark contrast, demonstrating the importance of the correction in obtaining an accurate dark-state model (Figs. 7 and 8). As the dark-state model serves as the foundation for interpreting light-induced structural changes, the correction described was required to confidently derive biological insights into the very first processes in receptor activation (Gruhl et al., 2022).

Figure 5. Absolute values of the Patterson function of corrected SwissFEL intensities with varying $\alpha_0$, $\alpha_1$ and $\alpha_2 = 1 - \alpha_0 - \alpha_1$ at real-space positions $t_d\mathbf{b}$ (a) and $2t_d\mathbf{b}$ (b). Correction of intensities with $\alpha_0$ = 0.01, $\alpha_1$ = 0.19 (black circle) dampens the Patterson peaks at both positions simultaneously.

Figure 6. Absolute values of the Patterson function of corrected SACLA intensities with varying $\alpha_0$, $\alpha_1$ and $\alpha_2 = 1 - \alpha_0 - \alpha_1$ at real-space positions $t_d\mathbf{b}$ (a) and $2t_d\mathbf{b}$ (b). Correction of intensities with $\alpha_0$ = 0.00, $\alpha_1$ = 0.15 (black circle) dampens the Patterson peaks at both positions simultaneously.
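The quoted cosine-series weights follow directly from the optimized domain fractions. The sketch below reads them as $\beta_0 = \alpha_0^2 + \alpha_1^2 + \alpha_2^2$, $\beta_1 = 2(\alpha_0\alpha_1 + \alpha_1\alpha_2)$ and $\beta_2 = 2\alpha_0\alpha_2$, which reproduce the numbers given for both beam times. The helper name `beta_weights` is illustrative.

```python
import numpy as np

def beta_weights(alphas):
    """Cosine-series weights (beta_0, beta_1, beta_2) from domain fractions."""
    a0, a1, a2 = alphas
    return np.array([a0**2 + a1**2 + a2**2,
                     2 * (a0 * a1 + a1 * a2),
                     2 * a0 * a2])

fractions = {"SwissFEL": (0.01, 0.19, 0.80), "SACLA": (0.00, 0.15, 0.85)}
betas = {name: beta_weights(a) for name, a in fractions.items()}
for name, b in betas.items():
    # Because the alphas sum to 1, the betas sum to (a0 + a1 + a2)**2 = 1.
    print(f"{name}: beta = {np.round(b, 4)}, beta1/beta0 = {b[1] / b[0]:.2f}, "
          f"beta2/beta0 = {b[2] / b[0]:.2f}, sum = {b.sum():.4f}")
```

The identity $\beta_0 + \beta_1 + \beta_2 = 1$ is a convenient sanity check on any set of fitted fractions.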

Discussion
Several examples of LTD in crystals of soluble proteins have been published since the first observation in 1954 (Bragg & Howells, 1954). Furthermore, occurrences of LTD may be underreported, as this pathology is difficult to detect and may often be ignored when the translated fraction is small (Lovelace & Borgstahl, 2020). LTD has also been identified in LCP-grown crystals of the rice silicon transporter Lsi1 (Saitoh et al., 2021). Similar to our observation of LTD in rhodopsin crystals, the Lsi1 lattice translation appears to occur parallel to the plane of the membrane. Indeed, lattice translations perpendicular to the plane of the LCP bilayer seem improbable as the hydrophobic transmembrane region of the protein would be displaced from the hydrophobic lipid environment to the aqueous environment between the lipid bilayers. Instead, type I LCP crystals, which assemble as stacks of 2D crystals, appear to be susceptible to translations in the relative positions of these 2D layers, which result in translations parallel to the plane of the membrane.
In the previously observed cases of LTD, the data were collected using the rotation method (Dauter, 1999), and it was possible to diagnose the disorder by direct inspection of the diffraction images. In Wang, Kamtekar et al. (2005), for example, crystals of Bacillus phage φ29 DNA polymerase showed a translation of 0.5c. This produced weak and streaky spots for reflections with odd values of l and strong spots for even values of l, because the main and translated lattices scatter out of phase for odd l and in phase for even l. Such an analysis is not possible with serial crystallography data. The intensities measured on individual frames are partial, so the intensity of a given reflection cannot be determined by inspection of a single diffraction image. Instead, it is necessary to search for non-origin peaks in the Patterson map that are too close to the origin to be explained by tNCS.
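The odd/even pattern described for a half-cell translation can be reproduced with a two-domain version of the LTD model. This minimal sketch assumes a fraction α of untranslated domain and 1 − α translated by 0.5c. The value α = 0.7 is an arbitrary illustration, not a value from Wang, Kamtekar et al. (2005).

```python
import numpy as np

def two_domain_modulation(l, alpha, t):
    """|alpha + (1 - alpha) exp(2 pi i t l)|^2: I_o / I_d for two domains shifted by t*c."""
    phase = np.exp(2j * np.pi * t * np.asarray(l, dtype=float))
    return np.abs(alpha + (1 - alpha) * phase) ** 2

l = np.arange(8)
alpha = 0.7                      # illustrative untranslated fraction (assumption)
m = two_domain_modulation(l, alpha, t=0.5)
# Even l: the two lattices scatter in phase, so the factor is 1.
# Odd l: they scatter out of phase, so the factor drops to (2*alpha - 1)**2.
print(np.round(m, 3))
```

Odd-l reflections are thus uniformly weakened by a factor that depends only on the domain fraction. This is why the pathology is visible by eye in rotation data, where full reflection intensities are recorded on each image.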
Despite working with a single protein purification and crystallization protocol, we observed some variability in the determined values of the translated domain fractions depending on the examined data set. Data sets from the SwissFEL beam time showed $\alpha_0$ and $\alpha_1$ values close to 0.01 and 0.19, respectively. In contrast, data sets collected during a previous beam time at SACLA had lower translated domain fractions, with values of $\alpha_0$ = 0.00 and $\alpha_1$ = 0.15. The unexplained difference density was therefore less prominent after model refinement using uncorrected intensities from the SACLA data set. The crystalline sample was prepared using the same protocol for both beam times. Nevertheless, subtle differences in sample preparation, such as the crystallization temperature or precipitant composition, may affect the probability of a lattice translocation occurring during crystal growth. Such a variation in the translated fraction has previously been observed in crystals of lentiviral integrase in complex with LEDGF and was ascribed to differences in crystal-growth conditions (Hare et al., 2009). It is also possible that the smaller beam size used at SACLA (1 × 1 µm, compared with 5 × 5 µm at SwissFEL) reduced the chance of collecting data from a region of the crystal containing a lattice-translocation defect.
The translated domain fractions were found to be similar for all data sets collected during the same beam time. This is to be expected as both dark and light data sets were collected from the same sample batches and the translated domain fractions are the result of an average over all diffracting crystals contributing to each data set.
We attempted the refinement of a composite atomic model against the uncorrected data, as was previously successful for the 1918 H1N1 neuraminidase (Zhu et al., 2008). The composite model contained the main dimer at occupancy $(1 - \alpha)$ and a second copy of the dimer at occupancy $\alpha$, translated from the main model by $\mathbf{t}_d$. Introducing an additional copy of the dimer during refinement against our serial crystallography rhodopsin data necessitates strict NCS constraints to avoid significantly worsening the data-to-parameter ratio. However, refinement of a composite model against uncorrected data was deemed impractical for bovine rhodopsin, because the $2mF_o - DF_c$ electron-density maps were difficult to interpret in the region where the two models overlapped. As previously observed by Zhu et al. (2008), we found that correction of the data for LTD was required to model solvent molecules in the overlapped region accurately. The positions of ordered water molecules are of key interest in studies of receptor dynamics, as they often stabilize hydrogen-bond networks in the transitions between conformational states.
We did not observe dramatic changes in the time-resolved $F_o^{\text{light}} - F_o^{\text{dark}}$ difference maps as a result of the correction (Fig. 9). This is because the light and dark data sets were collected from the same sample using an interleaved sequence of visible-light laser pulses to pump the sample and X-ray pulses to probe the structure. The systematic error due to LTD is therefore very similar in both data sets, usually with less than 1% difference in the translated populations. Instead, the most significant effect of LTD was to increase the number of poorly modelled regions in the dark-state models. The dark-state model was taken as the starting point for refining light-state structures into the extrapolated maps generated from the light-activated data sets. Correction for LTD therefore improved our models of both the dark and light-activated rhodopsin structures.

Conclusions
TR-SFX experiments provide unprecedented opportunities to understand ultrafast biological processes, including the photoisomerization of retinal in rhodopsin, which is the first event in mammalian vision. Having overcome many of the challenging barriers to performing a TR-SFX experiment with rhodopsin, we identified the LTD in the crystals at the data-processing stage. The described correction facilitated the interpretation of the data collected during these time- and resource-intensive experiments (Gruhl et al., 2022). Given the demanding pressures on sample preparation and beam time, it is often not possible to repeat an experiment after optimizing the crystals to avoid LTD.
Here, we have shown how LTD can be identified, characterized and corrected in a serial crystallography experiment. As these experiments, including time-resolved studies, become more widespread, the present method of LTD correction is likely to become a method of choice.